Spontaneous speech recognition using a massively parallel decoder
Abstract
Because spontaneous utterances vary widely, general models that are independent of speaker and task do not work well. This paper proposes combining cluster-based language and acoustic models within the framework of a Massively Parallel Decoder (MPD). The MPD is a parallel decoder with a large number of decoding units, each of which is assigned to one combination of element models. It runs efficiently on a parallel computer, so its turnaround time is comparable to that of a conventional decoder using a single model and a single processor. In experiments on lecture speech from the Corpus of Spontaneous Japanese, two types of cluster models were investigated: lecture-based cluster models and utterance-based cluster models. Utterance-based cluster models gave a significantly lower recognition error rate than lecture-based cluster models in both language and acoustic modeling. The experiments also showed that roughly 100 decoding units are enough in terms of recognition rate and that, in the best setting, a 12% reduction in word error rate was obtained in comparison with the conventional decoder.
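The idea of assigning one decoding unit to each combination of element models can be illustrated with a minimal Python sketch. It is not the authors' implementation: the model names, the toy scores, the `decode` stand-in, and the rule of keeping the highest-scoring hypothesis are all assumptions made for illustration, since the abstract does not say how the outputs of the units are combined.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical element models: a name mapped to a made-up score contribution.
LANGUAGE_MODELS = {"lm_a": 0.1, "lm_b": 0.3, "lm_c": 0.2}
ACOUSTIC_MODELS = {"am_x": 0.5, "am_y": 0.4}


def decode(utterance, lm_name, am_name):
    """Toy stand-in for one decoding unit.

    A real unit would run a full recognition search with its model pair;
    here the score is simply the sum of two made-up numbers.
    """
    score = LANGUAGE_MODELS[lm_name] + ACOUSTIC_MODELS[am_name]
    hypothesis = f"{utterance} [{lm_name}+{am_name}]"
    return score, hypothesis, lm_name, am_name


def massively_parallel_decode(utterance, max_workers=8):
    """Run one unit per (language model, acoustic model) pair in parallel."""
    pairs = list(product(LANGUAGE_MODELS, ACOUSTIC_MODELS))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda pair: decode(utterance, *pair), pairs))
    # Assumption: the unit with the highest score provides the final output.
    best = max(results, key=lambda result: result[0])
    return best, len(pairs)


if __name__ == "__main__":
    best, unit_count = massively_parallel_decode("example utterance")
    print(f"{unit_count} decoding units, best pair: {best[2]} + {best[3]}")
```

With three language models and two acoustic models the sketch creates six decoding units. Because the units are independent, wall-clock time on enough processors stays close to that of a single unit, which is the property the abstract describes as a comparable turnaround time.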
Related resources
Speech Summarization using Weighte
This paper proposes an integrated framework to summarize spontaneous speech into written-style compact sentences. Most current speech recognition systems attempt to transcribe whole spoken words correctly. However, recognition results of spontaneous speech are usually difficult to understand, even if the recognition is perfect, because spontaneous speech includes redundant information, and its ...
Selected topics from 40 years of research on speech and speaker recognition
This paper summarizes my 40 years of research on speech and speaker recognition, focusing on selected topics that I have investigated at NTT Laboratories, Bell Laboratories and Tokyo Institute of Technology with my colleagues and students. These topics include: the importance of spectral dynamics in speech perception; speaker recognition methods using statistical features, cepstral features, an...
An optimized multi-duration HMM for spontaneous speech recognition
In spontaneous speech, various speech style and speed changes can be observed, which are known to degrade speech recognition accuracy. In this paper, we describe an optimized multi-duration HMM (OMD). An OMD is a kind of multi-path HMM with at most two parallel paths. Each path is trained using speech samples with short or long phoneme duration. The thresholds to divide samples of phonemes are ...
An Assessment of Automatic Recognition Techniques for Spontaneous Speech in Comparison with Human Performance
To investigate problems of spontaneous speech recognition using N-grams and HMMs and estimate the room for improvement in the recognition rate, an automatic speech recognizer is evaluated in comparison with performances by human listeners. The evaluation task is to recognize spontaneous speech presentations from the Corpus of Spontaneous Japanese. Both the automatic recognizer and human listene...
Articulatory Features and Associated Production Models in Statistical Speech Recognition
A statistical approach to speech recognition is outlined which draws close parallel with closed-loop human speech communication schematized as a joint process of encoding and decoding of linguistic messages. The encoder consists of the symbolically-valued overlapping articulatory feature model and of its interface to a nonlinear task-dynamic model of speech production. A general speech recogniz...
Publication date: 2004